The coronavirus disease has spread throughout the world, and fear of it has made 
people more cautious in public places. Since precautionary measures are the only 
reliable protocol for protecting ourselves, social distancing is the best available 
approach for defending against the pandemic. The reproduction number, i.e., the R0 
factor of COVID-19, can be slowed down only through physical distancing norms. 
This research proposes a deep learning approach for maintaining social distance by 
detecting and tracking people in indoor and outdoor scenarios. Surveillance video 
is taken as the input and passed to the you only look once (YOLO) V3 algorithm. 
People in the video are identified by the segmentation algorithm within the 
framework, and the image is then evaluated using the Euclidean distance. The 
bounding box algorithm separates people based on a minimum distance threshold. 
The proposed method is evaluated on images of people in a market buying essential 
commodities and of students entering a campus. Our proposed region-based 
convolutional neural network (RCNN) algorithm gives better accuracy than 
traditional models, and hence the service can be deployed in general wherever 
social distancing is mandatory. 


This is an open access article under the CC BY-SA license. 


Corresponding Author: 


Atul Raj 

Department of Computer Science Engineering, Saveetha School of Engineering (SSE) 
Saveetha Institute of Medical and Technical Sciences (SIMATS), Saveetha University Chennai 
Tamilnadu, 602105 India 

Email: arulrajam1003.sse@saveetha.com 


1. INTRODUCTION 

Detecting COVID-19 from its basic symptoms has been a challenge for the past two years. In December 2019, many
cases of pneumonia caused by COVID-19 were found in the city of Wuhan. The entire world has since been in the grip
of fear caused by this disease of the digital age: it has spread to almost all countries, and new COVID-19 cases and
deaths, along with many new variants, are reported every day. Although the initial formation and spread of this
menace are still a subject of suspicion, the devastating effects this invisible enemy has had on mankind are
incalculable. The global COVID-19 pandemic has perhaps created a first-of-its-kind universal teachable moment for
humanity, exposing the fault lines of our societal and economic structures and institutions and how they serve us in
a moment of acute crisis. People have been required to self-isolate, limit external interactions through social
distancing, and follow the protocols defined by the World Health Organization (WHO). Social distancing is the best
practice for minimizing or interrupting the further spread of COVID-19, because it reduces physical contact between
possibly infected individuals and healthy people. Figure 1 shows people in a market who do not maintain social
distancing (Figure 1(a)) and people who do (Figure 1(b)).
Social distancing measures are adopted to prevent the spread of this disease by reducing the occasions on which
people come into contact and by keeping a physical distance between individuals. The most common symptoms are
fever, cough, loss of taste and smell, headache, and red or irritated eyes. The COVID-19 virus spreads more easily
in crowded places and through close contact, especially among people who communicate at very close range in
enclosed, confined spaces with poor ventilation. Infection can occur through the nose or mouth, through the eyes if
they are splashed or sprayed with contaminated fluids, through droplets or aerosols, and rarely via contaminated
surfaces. Transmission most commonly occurs when an infected person coughs, sneezes, or talks.

The proposed work measures the distance between people standing in a crowd or in a queue, where diseases may
spread. WHO has recommended social distancing as an effective way to mitigate the spread of the coronavirus, which
affects many people in different situations.


Figure 1. People in the market who (a) violate social distance and (b) maintain social distance 


2. RELATED WORK 

Das et al. [1] propose a novel social distancing and crowd identification method focused on density estimation,
anomaly recognition, and high-risk detection. Experimental results using the global nearest neighbor tracking
algorithm suggest that it provides good accuracy. The paper titled “Social distancing and face mask detection from
CCTV camera” proposes mask detection using OpenCV; the system detects face masks in photos and images taken from
real-time videos [2]. Shalini et al. [3] propose a social distance analyzer based on computer vision that evaluates
a video feed obtained from a surveillance camera. Kumar et al. [4] detect social distancing in footage of people
walking on pedestrian paths and draw a red or green bounding box using a deep learning approach. Monitoring social
distance under low-light conditions using deep learning and a motionless time-of-flight camera is proposed by
Rahim et al. [5]; the work describes the risk factor involved based on the density of violations. Saponara et al. [6]
implement a real-time, AI-based people detection and social distancing measuring system for COVID-19 and deploy
it on a low-cost embedded system (Jetson Nano) with fixed, non-rotating cameras. A fine-tuned you only look once
(YOLO) v3 combined with Deep SORT tracking is proposed to track people who frequently violate the distancing
norms. An improved region-based convolutional neural network (RCNN) with feature amplification and oversampling
for aerial images is proposed in [7].


3. PROPOSED MODELLING 

The aim of this work is to use a CCTV camera to check whether people maintain proper social distancing. Human
detection is implemented using the YOLO V4 algorithm, which is applied to the video after it is separated into a
number of image frames. These frames are then used to compute region proposals with the RCNN algorithm.


3.1. The YOLO V4 algorithm 

You only look once (YOLO) is one of the fastest object detection methods available. Although it is no longer the
most accurate object detection algorithm, it is an excellent choice when real-time detection is required without
sacrificing too much precision. Starting with the third version of YOLO, some speed was traded for gains in accuracy,
and this trade-off carries over to YOLO V4. YOLO is a real-time object detection method that recognizes particular
objects in videos, real-time feeds, or photos. YOLO v4 classifies the objects seen in images using multiple labels.
In early versions of YOLO, class scores were computed and the class with the highest score was taken as the class
of the object enclosed in the detected bounding box. This changed with YOLO v3, whose multi-label classification
YOLO v4 retains. Figure 2 shows how YOLO splits the frames.
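The difference between single-label scoring (take the top class) and multi-label scoring (an independent yes/no decision per class) can be illustrated with a minimal sketch. The label set, the raw logits, and the 0.5 decision threshold below are hypothetical; in a real YOLO head these scores are produced per predicted box.

```python
import math

def softmax(scores):
    # Numerically stable softmax: subtract the maximum before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def single_label(scores, labels):
    """Early-YOLO style: keep only the class with the highest softmax score."""
    probs = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best]

def multi_label(scores, labels, threshold=0.5):
    """YOLO v3/v4 style: an independent logistic classifier per class, so one
    box may receive several labels (threshold is an assumed value)."""
    return [lab for s, lab in zip(scores, labels) if sigmoid(s) >= threshold]

labels = ["person", "pedestrian", "bicycle"]  # hypothetical label set
raw = [2.0, 1.5, -3.0]                         # hypothetical raw class logits
print(single_label(raw, labels))  # person
print(multi_label(raw, labels))   # ['person', 'pedestrian']
```

With softmax the two overlapping labels compete and only one survives, whereas independent sigmoids let a box be both "person" and "pedestrian".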

The YOLO v4 target detector satisfies the real-time requirements of a particular problem and offers both detection
speed and precision. However, YOLO v4 demands high-performance hardware and has a large backbone network, which
hinders its deployment in a wide range of applications. In this proposal, video is captured by a real-time CCTV
camera, and the crowd is detected from the acquired video data using artificial intelligence. The result is passed
to a one-stage detector that detects only the people, and finally the images are split into a number of frames.
Figure 2. YOLO architecture for splitting the frames


3.2. The region based convolutional neural network algorithm 

RCNN is central to deep-learning-based object detection and achieves impressive detection results [8]. The RCNN
method uses trained CNNs to classify each candidate area and then determines whether it belongs to an object or to
the background. The RCNN pipeline begins by generating region proposals, that is, regions in an image that
potentially correspond to a specific item. The selective search algorithm generates image subsegments that
potentially belong to one object based on color, texture, size, and shape, and then repeatedly combines similar
regions to construct objects [9]. This generates object proposals at various scales in the image. Figure 3 shows the
architecture for object detection.
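The grouping step of selective search can be sketched as follows. This is a simplified illustration, not the full algorithm: it assumes the initial over-segmentation is already given as regions with a bounding box, a pixel count, and a normalized color histogram; it approximates adjacency by touching bounding boxes; and it uses only color and size similarity (the original method also uses texture and fill similarity).

```python
from itertools import combinations

def color_similarity(h1, h2):
    # Histogram intersection: sum of bin-wise minima of two normalized histograms.
    return sum(min(a, b) for a, b in zip(h1, h2))

def size_similarity(r1, r2, image_size):
    # Favors merging small regions first: 1 minus their joint share of the image area.
    return 1.0 - (r1["size"] + r2["size"]) / image_size

def touching(b1, b2):
    # Simplified adjacency test (assumption): bounding boxes overlap or share an edge.
    return not (b1[2] < b2[0] or b2[2] < b1[0] or b1[3] < b2[1] or b2[3] < b1[1])

def similarity(r1, r2, image_size):
    return color_similarity(r1["hist"], r2["hist"]) + size_similarity(r1, r2, image_size)

def merge(r1, r2):
    size = r1["size"] + r2["size"]
    box = (min(r1["box"][0], r2["box"][0]), min(r1["box"][1], r2["box"][1]),
           max(r1["box"][2], r2["box"][2]), max(r1["box"][3], r2["box"][3]))
    # Size-weighted average keeps the merged histogram normalized.
    hist = [(a * r1["size"] + b * r2["size"]) / size
            for a, b in zip(r1["hist"], r2["hist"])]
    return {"box": box, "size": size, "hist": hist}

def selective_search(regions, image_size):
    """Greedy hierarchical grouping: repeatedly merge the most similar pair of
    adjacent regions and keep every region ever formed as a proposal."""
    regions = list(regions)
    proposals = [r["box"] for r in regions]
    while len(regions) > 1:
        pairs = [(i, j) for i, j in combinations(range(len(regions)), 2)
                 if touching(regions[i]["box"], regions[j]["box"])]
        if not pairs:
            break
        i, j = max(pairs, key=lambda p: similarity(regions[p[0]], regions[p[1]], image_size))
        merged = merge(regions[i], regions[j])
        regions = [r for k, r in enumerate(regions) if k not in (i, j)] + [merged]
        proposals.append(merged["box"])
    return proposals

# Hypothetical initial segmentation of a 10 x 10 image, boxes as (x0, y0, x1, y1).
initial = [
    {"box": (0, 0, 4, 4), "size": 16, "hist": [0.8, 0.2]},
    {"box": (4, 0, 9, 4), "size": 20, "hist": [0.7, 0.3]},
    {"box": (0, 4, 9, 9), "size": 45, "hist": [0.1, 0.9]},
]
proposals = selective_search(initial, image_size=100)
print(proposals)
```

The two similarly colored top regions merge first, then the result merges with the bottom region, so proposals appear at several scales, from the initial segments up to the whole image.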
Figure 3. RCNN architecture for object detection


3.3. The overall architecture of the proposal 

Object detection involves two separate tasks: classification and localization. RCNN is a region-based convolutional
neural network, and region proposals are the key element of the RCNN family; they are used to locate objects within
an image. The RCNN detection algorithm has two stages. The first stage selects image regions that may contain an
object, and the second stage classifies the object in each region. In this RCNN architecture, the selective search
technique generates around 2,000 region proposals. These 2,000 region proposals are fed into the CNN architecture,
which computes CNN features. These features are then fed into a support vector machine (SVM) model, which classifies
the object in each region proposal. Figure 4 shows the overall architecture of the proposed system.
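The two-stage flow (proposals, then features, then SVM classification) can be sketched as below. This is a minimal sketch under stated assumptions: the feature extractor is a hypothetical stand-in for the CNN (the original R-CNN warps each proposal to a fixed size before computing features), and the SVM weights are assumed to be already trained. Only the scoring logic of a linear SVM, w . x + b, is shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)

@dataclass
class LinearSVM:
    """Decision function w . x + b of a linear SVM whose weights are assumed already trained."""
    weights: Sequence[float]
    bias: float

    def decision(self, features: Sequence[float]) -> float:
        return sum(w * f for w, f in zip(self.weights, features)) + self.bias

def rcnn_detect(proposals: List[Box],
                extract_features: Callable[[Box], Sequence[float]],
                svm: LinearSVM) -> List[Tuple[Box, float]]:
    """Stage 1 supplies the proposals; stage 2 computes features for each one and
    lets the SVM classify it as person (positive margin) or background."""
    detections = []
    for box in proposals:
        score = svm.decision(extract_features(box))
        if score > 0:
            detections.append((box, score))
    # Highest-scoring detections first.
    return sorted(detections, key=lambda d: d[1], reverse=True)

# Hypothetical stand-in for CNN features: a fixed 2-D vector per proposal.
fake_features = {
    (0, 0, 2, 5): [1.0, 0.2],
    (3, 3, 9, 4): [0.1, 0.9],
    (5, 0, 7, 6): [0.8, 0.1],
}
svm = LinearSVM(weights=[2.0, -1.5], bias=-0.5)
detections = rcnn_detect(list(fake_features), fake_features.__getitem__, svm)
print(detections)
```

Here the second proposal falls on the negative side of the hyperplane and is discarded as background; swapping `fake_features.__getitem__` for a real feature extractor would leave the rest unchanged.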
Figure 4. Architecture for social distancing measurement using RCNN


3.4. Implementation 

Crowd density, crowd counting, and flow estimation are some of the components on which crowd monitoring focuses
[10]. The video captured and recorded by a CCTV camera is given as the input. From each video frame, the region of
interest (ROI) containing the walking person is captured. Calibration is done by transforming the captured view into
a bird's-eye view. After this transformation, each person in the frame is assumed to stand on the ground plane and is
represented by a bounding box. The implementation of the work is presented in Figure 3. The main idea behind YOLO
[11] is to divide the original image into an S × S grid. Each grid cell predicts a fixed number B of bounding boxes,
each with a confidence score, and detects only one object, for which it predicts C conditional class probabilities,
one per class. The predicted class is the one with the highest score, and the detected people are then checked
against a distance threshold of 1 meter. Figure 5 shows individual person detection in the system.
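The grid prediction described above can be decoded as in the following sketch. Each of the B boxes in a cell is scored per class as confidence × Pr(class | object), the best class is kept, and low scores are discarded. The 2 × 2 grid, the class names, and the 0.2 score threshold are hypothetical (the original YOLO used S = 7, B = 2, C = 20), and the non-maximum suppression step that normally follows is omitted.

```python
from typing import List, Tuple

BoxPred = Tuple[float, float, float, float, float]  # (x, y, w, h, confidence)
Cell = Tuple[List[BoxPred], List[float]]            # (B box predictions, C class probabilities)

def decode_grid(grid: List[List[Cell]], class_names: List[str],
                score_threshold: float = 0.2):
    """For each of the B boxes in every grid cell, compute the class-specific score
    confidence * Pr(class | object), keep the best class, and apply a threshold."""
    detections = []
    for row, cells in enumerate(grid):
        for col, (boxes, class_probs) in enumerate(cells):
            for x, y, w, h, conf in boxes:
                scores = [conf * p for p in class_probs]
                best = max(range(len(scores)), key=scores.__getitem__)
                if scores[best] >= score_threshold:
                    detections.append((row, col, (x, y, w, h), class_names[best], scores[best]))
    return detections

# Hypothetical 2 x 2 grid (S = 2) with B = 2 boxes and C = 2 classes per cell.
low = ([(0.5, 0.5, 0.1, 0.1, 0.05), (0.5, 0.5, 0.1, 0.1, 0.05)], [0.5, 0.5])
grid = [
    [([(0.5, 0.5, 0.2, 0.6, 0.9), (0.4, 0.5, 0.3, 0.3, 0.1)], [0.8, 0.2]), low],
    [low, ([(0.3, 0.6, 0.2, 0.5, 0.7), (0.3, 0.6, 0.4, 0.4, 0.2)], [0.3, 0.7])],
]
dets = decode_grid(grid, ["person", "car"])
for det in dets:
    print(det)
```

Only the high-confidence box in cell (0, 0) (person, score 0.72) and the first box in cell (1, 1) (car, score 0.49) survive; in the proposed system only the "person" detections would go on to the distance check.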


Figure 5. Individual person detection with the bounding box 


The final classification of the RCNN output is made by a Euclidean distance classifier, as presented in Figure 2
[12]. The value most commonly cited by the World Health Organization for a valid social distance is 1 meter, which
is set as the threshold value. The model checks the distance between each pair of individuals against this
threshold [13]. If the threshold is satisfied, the model decides that distancing is maintained; otherwise, it decides
that a social distance violation has occurred and displays the bounding box in red, as shown in Figures 6 and 7.
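The threshold check can be sketched as follows, assuming positions have already been converted to ground-plane coordinates in meters. Two assumptions are made here: a pair closer than 1 m counts as a violation (exactly 1 m counts as maintained), and each person gets one color, red if they are too close to anyone and green otherwise.

```python
import math
from itertools import combinations
from typing import List, Tuple

RED, GREEN = (0, 0, 255), (0, 255, 0)  # BGR color tuples, the channel order OpenCV drawing uses

def label_violations(positions: List[Tuple[float, float]], threshold_m: float = 1.0):
    """Return (colors, violating_pairs) for ground-plane positions given in meters.
    A person is red if anyone else stands closer than threshold_m, otherwise green."""
    violators = set()
    pairs = []
    for i, j in combinations(range(len(positions)), 2):
        (x1, y1), (x2, y2) = positions[i], positions[j]
        d = math.hypot(x2 - x1, y2 - y1)
        if d < threshold_m:
            violators.update((i, j))
            pairs.append((i, j, d))
    colors = [RED if k in violators else GREEN for k in range(len(positions))]
    return colors, pairs

# Hypothetical positions (meters) of four people after calibration.
people = [(0.0, 0.0), (0.5, 0.5), (3.0, 0.0), (3.0, 1.5)]
colors, pairs = label_violations(people)
print(colors)  # [(0, 0, 255), (0, 0, 255), (0, 255, 0), (0, 255, 0)]
```

The first two people are about 0.71 m apart and are both drawn in red; the other two are 1.5 m apart and are drawn in green.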

The deep learning algorithm can be implemented in automated systems for use in health centres [14]. Deep learning
gets its name from the number of additional layers used to learn from data. When a deep learning model learns, it
simply updates its weights using an optimization algorithm. The process flow of our model starts with getting a video
frame from the camera as input, continues with detecting the crowded place, calibrating the camera view, and
measuring distances, and finally produces the result.
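The calibration and distance-measurement steps of this process flow can be sketched with a homography that maps four image points onto four known ground-plane points (a bird's-eye view). OpenCV provides `cv2.getPerspectiveTransform` and `cv2.perspectiveTransform` for this purpose; the sketch below solves the same 8 × 8 linear system directly to stay dependency-free. The calibration points, the boxes, and the choice of the bottom-center of a bounding box as the person's ground position are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate calibration points")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def homography(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """3x3 matrix mapping four image points onto four ground-plane points (h33 = 1)."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -x * u, -y * u])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -x * v, -y * v])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]

def apply_homography(H, p: Point) -> Point:
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def foot_point(box) -> Point:
    # Bottom-center of a bounding box (x0, y0, x1, y1): where the person meets the ground.
    x0, _, x1, y1 = box
    return ((x0 + x1) / 2, y1)

def ground_distance(H, box_a, box_b) -> float:
    ax, ay = apply_homography(H, foot_point(box_a))
    bx, by = apply_homography(H, foot_point(box_b))
    return math.hypot(bx - ax, by - ay)

# Hypothetical calibration: a 4 m x 4 m floor square seen as a trapezoid in the image.
ground = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
image_trapezoid = [(100, 300), (300, 300), (350, 400), (50, 400)]
H_view = homography(image_trapezoid, ground)

# Simple top-down view (pure scaling, 100 px = 1 m) so the distance can be checked by hand.
H_top = homography([(0, 0), (400, 0), (400, 400), (0, 400)], ground)
dist_m = ground_distance(H_top, (80, 100, 120, 300), (180, 150, 220, 300))
print(round(dist_m, 3))  # feet at (100, 300) and (200, 300) px are 1.0 m apart
```

The distances produced this way, in meters, are what would be compared against the 1 meter threshold.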
In this process, various features, such as arm and body movements, are extracted in order to obtain effective
results. Experiments on various images and videos suggest that our proposed method of using RCNN to identify social
distancing performs better than a CNN. Computer vision is an interdisciplinary field in which object detection has
gained increasing attention in recent years. The RCNN approach is similar to the CNN algorithm, but the difference
is that RCNN detection algorithms draw a bounding box around the desired object to locate it within the original
image. Moreover, an object detector does not necessarily draw just one bounding box: there can be several bounding
boxes representing various objects of interest in the same image, and RCNN locates them much faster than a CNN. The
limitations of the proposed work are that the picture quality must be very high and that cameras suited to low-light
conditions are needed to detect people at night. It is also difficult to detect people and calculate the distance
between them against an uncontrolled background.


Figure 6. Individual person detection with social distance
Figure 7. Violation of social distance


4. RESULTS AND DISCUSSIONS 

Distance measurement is the most important part of the work, so the Euclidean distance measuring algorithm is used
to find the distance between two points. Let (x1, y1) and (x2, y2) be any two points; the distance between them is
evaluated as

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

Based on this, the research work considers an area containing 15 people. Table 1 summarizes the distance
calculations and their comparison with the threshold value. Among the 15 people, only 4 coordinates violate social
distancing, as shown in Figure 8; the other coordinates maintain social distancing. Consider two points P(x1, y1)
and Q(x2, y2), with d the distance between them. Join P and Q by a line segment and construct a right-angled
triangle whose hypotenuse is PQ by drawing horizontal and vertical lines from P and Q that meet at t1, as shown in
Figure 9. The legs of this triangle have lengths |x2 - x1| and |y2 - y1|, so the Pythagorean theorem gives the
formula for d above.
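The calculation can be checked directly on the coordinates of Table 1. The paper states a 1 meter threshold but does not give the scale between its grid units and meters, so the sketch assumes a threshold of 2 grid units; any threshold from √2 ≈ 1.41 up to (but not including) √5 ≈ 2.24 units flags exactly the four people P, Q, C, and I and the four pairs listed in Table 2. For P(3, 2) and Q(4, 1), the meeting point is t1 = (4, 2), both legs have length 1, and d = √2 ≈ 1.41.

```python
import math
from itertools import combinations

# (x, y) coordinates from Table 1, in the paper's grid units.
PEOPLE = {
    "P": (3, 2), "A": (1, 10), "B": (9, 2), "Q": (4, 1), "C": (4, 3),
    "D": (7, 7), "E": (9, 6), "F": (12, 1), "G": (3, 8), "H": (1, 6),
    "I": (5, 4), "J": (1, 1), "K": (10, 10), "L": (5, 10), "M": (12, 5),
}

def meeting_point(p, q):
    # Right-angle vertex where the horizontal line through P meets the vertical line through Q.
    return (q[0], p[1])

def euclidean(p, q):
    # d = sqrt((x2 - x1)^2 + (y2 - y1)^2): the hypotenuse of the right triangle P, t, Q.
    return math.hypot(q[0] - p[0], q[1] - p[1])

def close_pairs(people, threshold):
    """Every pair of people whose distance is at most `threshold`."""
    result = []
    for a, b in combinations(people, 2):
        d = euclidean(people[a], people[b])
        if d <= threshold:
            result.append((a, b, round(d, 2)))
    return result

THRESHOLD_UNITS = 2.0  # ASSUMPTION: grid-unit stand-in for the 1 m limit (scale not given)
pairs = close_pairs(PEOPLE, THRESHOLD_UNITS)
violators = sorted({name for a, b, _ in pairs for name in (a, b)})
print(meeting_point(PEOPLE["P"], PEOPLE["Q"]))  # (4, 2)
print(pairs)      # [('P', 'Q', 1.41), ('P', 'C', 1.41), ('Q', 'C', 2.0), ('C', 'I', 1.41)]
print(violators)  # ['C', 'I', 'P', 'Q']
```

The remaining 11 people are at least √5 ≈ 2.24 units from everyone else, matching the "Non violated" entries in Table 1.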

Table 2 presents the coordinates P, Q, C, and I with their Euclidean distances, showing that they violate the
1 meter threshold. The proposed work was also evaluated on 10 different videos recorded in public places, as
presented in Table 3, and the RCNN algorithm was found to perform better and to detect the violators. The framework
is used in this study to predict intentions of social distancing violation and past non-compliance during the
COVID-19 pandemic. Figure 10 shows the total numbers of violating and non-violating coordinates.


Table 1. Violation of social distance among 15 people 


Person          X axis   Y axis   Social distancing
Person 1 (P)    3        2        Violated
Person 2 (A)    1        10       Non violated
Person 3 (B)    9        2        Non violated
Person 4 (Q)    4        1        Violated
Person 5 (C)    4        3        Violated
Person 6 (D)    7        7        Non violated
Person 7 (E)    9        6        Non violated
Person 8 (F)    12       1        Non violated
Person 9 (G)    3        8        Non violated
Person 10 (H)   1        6        Non violated
Person 11 (I)   5        4        Violated
Person 12 (J)   1        1        Non violated
Person 13 (K)   10       10       Non violated
Person 14 (L)   5        10       Non violated
Person 15 (M)   12       5        Non violated
Figure 8. Coordinates of people standing in an area
Figure 9. Graph showing the distance between the
Figure 10. Total number of violated and non-violated coordinates


Table 2. The distance calculation of coordinates P, Q, C, and I 
Coordinates without social distance | Meeting point | Euclidean distance (in units)
P, Q | t1 | 1.41
P, C | t2 | 1.41
C, I | t3 | 1.41
Q, C | t4 | 2


Table 3. Findings of violation of social distance from various video file formats 


CCTV video | Video file name | Total no. of persons detected | No. of persons following social distance | No. of persons violating social distance
1 | group.mp4 | 14 | 6 | 8
2 | crowd.vlc | 16 | 10 | 6
3 | people_walking.avi | 8 | 5 | 3
4 | walking.avi | 12 | 8 | 4
5 | people_stock.fly | 8 | 4 | 4
6 | people_video.mp3 | 7 | 4 | 3
7 | school_students.mov | 15 | 11 | 4
8 | street.mp4 | 9 | 6 | 3
9 | play_ground.mpeg | 8 | 5 | 3
10 | market.mp3 | 9 | 6 | 3


The quantile-quantile (Q-Q) plot [15], [16] is a graphical representation of the quantiles of two distributions in
relation to each other: it plots the quantiles of one distribution against the corresponding quantiles of the other.
When reading a Q-Q plot [17], [18], the reference line y = x is considered; points lying close to this line indicate
that the two distributions are similar. The Q-Q plot is thus a graphical approach for assessing whether two data sets
come from the same population. For the data shown in Table 1, Figures 11 and 12 show the plots for the X and Y
quantities, respectively.
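The points of such a plot can be computed as in the sketch below. The paper does not state which reference distribution its Q-Q plots use; this sketch assumes a normal distribution fitted with the sample mean and standard deviation, and the common plotting positions (i - 0.5) / n. Python's standard-library `statistics.NormalDist.inv_cdf` supplies the theoretical quantiles.

```python
from statistics import NormalDist, mean, stdev

def qq_points(sample):
    """(theoretical quantile, observed quantile) pairs for a normal Q-Q plot.
    Plotting positions are (i - 0.5) / n; the reference normal is fitted with
    the sample mean and standard deviation, so points near y = x suggest normality."""
    data = sorted(sample)
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    return [(fitted.inv_cdf((i - 0.5) / n), x) for i, x in enumerate(data, start=1)]

# X and Y columns of Table 1 (persons 1 to 15).
xs = [3, 1, 9, 4, 4, 7, 9, 12, 3, 1, 5, 1, 10, 5, 12]
ys = [2, 10, 2, 1, 3, 7, 6, 1, 8, 6, 4, 1, 10, 10, 5]

for label, column in (("X", xs), ("Y", ys)):
    print(label)
    for theoretical, observed in qq_points(column):
        print(f"  {theoretical:6.2f}  {observed:3d}")
```

Plotting each printed pair as a point, together with the line y = x, gives a plot of the kind shown in Figures 11 and 12 under this normal-reference assumption.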


This research describes a deep learning-based system for detecting social distancing in order to reduce the impact
of the coronavirus pandemic [19]-[21]. The detection tool was created to warn people [22]-[25] to keep a safe
distance from each other by analyzing a video feed.
Figure 11. Graph showing plots for X-axis quantities
Figure 12. Graph showing plots for Y-axis quantities


5. CONCLUSION 

The objective of this study is to analyze and classify social distancing among groups of people gathered in a place.
Analyzing video footage of a public place and finding the distance between people is a challenging task that
involves machine learning algorithms for finding the people present and calculating the distances between them. Our
proposed work is a novel approach that helps to monitor the distancing norms imposed by COVID-19 protocols in public
places. The algorithm uses a region-based CNN, which appears to perform better than the traditional CNN algorithm.
Our approach effectively detects and identifies the density of the crowd and indicates whether the crowd is
violating the norms. The work was also tested in different public places, such as schools, markets, and railway
stations, and found to be very effective. This method can be extended to classify videos in which multiple
violations occur and where norms must be strictly adhered to. This would digitally enhance public monitoring systems
for preventing COVID-19 and benefit society. Depending on whether or not people adhere to social distancing, the
proposed system draws red or green bounding boxes over them. It can also recognize persons in real time from web
cameras and CCTV. This could aid in designing public space layouts or in implementing preventive measures to reduce
high-risk areas. It may also be used in areas such as driverless vehicles, action recognition technology, and crowd
analysis.